Abstract

A toy detector is designed to simulate the central detectors of reactor
neutrino experiments. Samples of neutrino events and of three major backgrounds
are generated in the signal region with a Monte-Carlo simulation of the toy
detector. Bayesian neural networks (BNN) are applied to separate the neutrino
events from the backgrounds. Most neutrino events and most uncorrelated
background events in the signal region can be identified with BNN, as can part
of the fast neutron and $^8$He/$^9$Li background events. The signal-to-noise
ratio in the signal region is therefore enhanced with BNN. The neutrino
discrimination increases as the neutrino rate in the training sample increases,
whereas the background discriminations decrease as the background rates in the
training sample decrease.
1 Introduction

The main goals of reactor neutrino experiments are to detect
$\bar{\nu}_e \to \bar{\nu}_x$ oscillation and to measure precisely the neutrino
mixing angle $\theta_{13}$. The experiments are designed to detect reactor
$\bar{\nu}_e$'s via the inverse $\beta$-decay reaction

$\bar{\nu}_e + p \to e^+ + n$.

The signature is a delayed coincidence between the $e^+$ signal and the neutron
capture signal. In this paper only three important sources of background are
taken into account: the uncorrelated background from natural radioactivity and
the correlated backgrounds from fast neutrons and $^8$He/$^9$Li. Like the
neutrino events, the background events consist of two signals, a fast signal
and a delayed signal. It is vital to separate neutrino events from backgrounds
accurately in reactor neutrino experiments. The cut-based selection of neutrino
events divides the event space into two regions with a hyper-cuboid defined by
the cuts: events inside the hyper-cuboid, called the signal region, are
regarded as neutrino events, and events outside it are regarded as backgrounds.
Backgrounds that fall inside the signal region therefore cannot be rejected by
this method. Bayesian neural networks (BNN) [1] are neural networks trained
with Bayesian statistics. A BNN is not only a non-linear function, like an
ordinary neural network, but also controls model complexity. Its flexibility
makes it possible to discover more general relationships in data than
traditional statistical methods can, and its preference for simple models makes
it more robust against over-fitting than ordinary neural networks [2]. BNN have
been used for particle identification and event reconstruction in high energy
physics experiments, see Refs. [3, 4, 5]. In this paper, BNN are applied to
discriminate neutrino events from background events in the signal region of
reactor neutrino experiments.



2 The Classification with BNN[1, 5] 

The idea of Bayesian neural networks is to regard the training of a neural
network as Bayesian inference. Bayes' theorem is used to assign a posterior
density to each point, $\theta$, in the parameter space of the neural networks.
Each point $\theta$ denotes a neural network. In the BNN method one performs a
weighted average over all points in the parameter space, that is, over all
neural networks. The method makes use of training data
$(x_1, t_1), (x_2, t_2), ..., (x_n, t_n)$, where $t_i$ is the known label
associated with the data $x_i$: $t_i = 0, 1, ..., N-1$ if the classification
problem has $N$ classes, and $x_i$ has $P$ components if the classification
depends on $P$ factors. The set of data $x = (x_1, x_2, ..., x_n)$ thus
corresponds to the set of targets $t = (t_1, t_2, ..., t_n)$. The posterior
density assigned to the point $\theta$, that is, to a neural network, is given
by Bayes' theorem

$p(\theta | x, t) = \frac{p(x, t | \theta)\, p(\theta)}{p(x, t)} = \frac{p(t | x, \theta)\, p(x | \theta)\, p(\theta)}{p(t | x)\, p(x)} = \frac{p(t | x, \theta)\, p(\theta)}{p(t | x)}$   (1)

where the data $x$ do not depend on $\theta$, so $p(x | \theta) = p(x)$. We need
the likelihood $p(t | x, \theta)$ and the prior density $p(\theta)$ in order to
assign the posterior density $p(\theta | x, t)$ to the neural network defined by
the point $\theta$. $p(t | x)$ is called the evidence and plays the role of a
normalizing constant, so it can be ignored. That is,

Posterior $\propto$ Likelihood $\times$ Prior   (2)

We consider a class of neural networks defined by the function

$y(x, \theta) = \frac{1}{1 + \exp[-a(x, \theta)]}$   (3)

where

$a(x, \theta) = b + \sum_{j=1}^{H} v_j \tanh\left(a_j + \sum_{i=1}^{P} u_{ij} x_i\right)$   (4)

The neural networks have $P$ inputs, a single hidden layer of $H$ hidden nodes
and a single output. In the particular Bayesian neural networks described here,
each neural network has the same structure. The parameters $u_{ij}$ and $v_j$
are called the weights and $a_j$ and $b$ are called the biases. Both sets of
parameters are generally referred to collectively as the weights of the
Bayesian neural networks, $\theta$. $y(x, \theta)$ is the probability that an
event, $(x, t)$, belongs to the signal. So the likelihood of $n$ training
events is



$p(t | x, \theta) = \prod_{i=1}^{n} y^{t_i} (1 - y)^{1 - t_i}$   (5)

where $y$ is evaluated at $x_i$ and the events are assumed to be independent of
each other.
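
As a minimal sketch of how the network output and the likelihood of Eq. (5) fit together, the Python below evaluates a one-hidden-layer network and the log-likelihood of a small labelled sample. The tanh hidden units, the logistic output, the clipping constant and all names (`network_output`, `log_likelihood`, the example weights) are assumptions made for illustration; they are not taken from the software used in the paper.

```python
import math

def network_output(x, u, a, v, b):
    """Output y(x, theta) of a network with P inputs and H hidden nodes.

    x: P inputs; u[j][i]: input-to-hidden weights; a[j]: hidden biases;
    v[j]: hidden-to-output weights; b: output bias.
    Assumption: tanh hidden units and a logistic output.
    """
    hidden = [
        math.tanh(a_j + sum(w * x_i for w, x_i in zip(u_j, x)))
        for u_j, a_j in zip(u, a)
    ]
    activation = b + sum(v_j * h_j for v_j, h_j in zip(v, hidden))
    return 1.0 / (1.0 + math.exp(-activation))

def log_likelihood(targets, outputs, eps=1e-12):
    """Logarithm of Eq. (5) for binary labels t_i and outputs y_i."""
    total = 0.0
    for t, y in zip(targets, outputs):
        y = min(max(y, eps), 1.0 - eps)  # avoid log(0); numerical guard only
        total += t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total

# Hypothetical two-input network with one hidden node and two labelled events.
u, a, v, b = [[0.5, -0.3]], [0.1], [1.2], -0.2
events = [([1.0, 2.0], 1), ([0.0, -1.0], 0)]
outputs = [network_output(x, u, a, v, b) for x, _ in events]
print(log_likelihood([t for _, t in events], outputs))
```

Working with the logarithm turns the product over events into a sum, which is the usual way to keep the likelihood numerically stable for large training samples.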

With the likelihood in hand, a prior is needed to compute the posterior
density. The choice of prior is not obvious. However, experience suggests that
a reasonable class is Gaussian priors centered at zero, which favor smaller
rather than larger weights, because smaller weights yield smoother fits to the
data. In this paper a Gaussian prior is specified for each weight using the
Bayesian neural networks package of Radford Neal$^1$. The variance is chosen to
be the same for all weights belonging to a given group (input-to-hidden weights
($u_{ij}$), hidden biases ($a_j$), hidden-to-output weights ($v_j$) or output
bias ($b$)): $\sigma_u^2$, $\sigma_a^2$, $\sigma_v^2$ and $\sigma_b^2$,
respectively. Since we do not know, a priori, what these variances should be,
their values are allowed to vary over a large range while favoring small
variances. This is done by assigning each variance a gamma prior

$p(z) = \left(\frac{\alpha}{\mu}\right)^{\alpha} \frac{z^{\alpha - 1} e^{-z \alpha / \mu}}{\Gamma(\alpha)}$   (6)

where $z = \sigma^{-2}$, and the mean $\mu$ and shape parameter $\alpha$ are set
to fixed plausible values. The gamma prior is referred to as a hyperprior and
its parameters are called hyperparameters.
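
The hierarchical prior can be illustrated with a short Python sketch: a precision $z = \sigma^{-2}$ is drawn from a gamma density with a given mean and shape, and converted to a standard deviation for one weight group. The function names and the hyperparameter values (mean 4, shape 2) are hypothetical, and the mean/shape parameterization is assumed from the description in the text rather than taken from Neal's software.

```python
import math
import random

def gamma_hyperprior_pdf(z, mean, shape):
    """Gamma density with the given mean and shape for the precision z."""
    rate = shape / mean
    return rate ** shape * z ** (shape - 1.0) * math.exp(-rate * z) / math.gamma(shape)

def sample_sigma(mean, shape, rng):
    """Draw a precision from the hyperprior and return sigma = z^(-1/2)."""
    z = rng.gammavariate(shape, mean / shape)  # arguments: shape, scale
    return 1.0 / math.sqrt(z)

rng = random.Random(0)
# Hypothetical hyperparameters; the paper does not quote the values it used.
sigmas = [sample_sigma(mean=4.0, shape=2.0, rng=rng) for _ in range(5)]
print(sigmas)
```

Each sampled `sigma` would then serve as the common width of the zero-centered Gaussian prior for every weight in that group.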

The posterior density, $p(\theta | x, t)$, is then obtained from Eqs. (2) and
(5) and the Gaussian prior. Given an event with data $x'$, an estimate of the
probability that it belongs to the signal is given by the weighted average

$y(x' | x, t) = \int y(x', \theta)\, p(\theta | x, t)\, d\theta$   (7)

$^1$ R. M. Neal, Software for Flexible Bayesian Modeling and Markov Chain
Sampling, http://www.cs.utoronto.ca/~radford/fbm.software.html

Currently, the only practical way to perform the high-dimensional integral in
Eq. (7) is to sample the density $p(\theta | x, t)$ with the Markov Chain Monte
Carlo (MCMC) method [1, 6, 7, 8]. In the MCMC method, one steps through the
$\theta$ parameter space in such a way that points are visited with a
probability proportional to the posterior density, $p(\theta | x, t)$. Points
where $p(\theta | x, t)$ is large are visited more often than points where it
is small.

The integral in Eq. (7) is then approximated by the average

$y(x' | x, t) \approx \frac{1}{L} \sum_{i=1}^{L} y(x', \theta_i)$   (8)

where $L$ is the number of points $\theta$ sampled from $p(\theta | x, t)$. Each
point $\theta$ corresponds to a different neural network with the same
structure. The average is therefore an average over neural networks, and it
estimates the probability that the data $x'$ belong to the signal. The average
approaches the true value of $y(x' | x, t)$ as $L$ becomes sufficiently large.
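
The average over sampled networks can be sketched in a few lines of Python. Here each sampled parameter point is represented by a callable that returns a signal probability; the helper `make_network` and the three hand-picked parameter pairs are hypothetical stand-ins for points drawn by MCMC.

```python
import math

def posterior_average(x_new, sampled_networks):
    """Average the outputs of the L sampled networks for one new event."""
    outputs = [network(x_new) for network in sampled_networks]
    return sum(outputs) / len(outputs)

def make_network(weight, bias):
    """Hypothetical one-input logistic network standing in for one sample."""
    return lambda x: 1.0 / (1.0 + math.exp(-(weight * x + bias)))

# Three hand-picked parameter points play the role of MCMC samples.
networks = [make_network(w, b) for w, b in [(1.0, 0.0), (1.5, -0.2), (0.8, 0.1)]]
print(posterior_average(0.5, networks))
```

Because the sampler visits points in proportion to the posterior density, a plain unweighted mean over the stored networks already carries the posterior weighting.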

3 Toy Detector and Simulation[4] 

3.1 Toy Detector 

In this paper a toy detector is designed with the CERN GEANT4 package [11] to
simulate the central detectors of reactor neutrino experiments such as the Daya
Bay experiment [9] and the Double Chooz experiment [10]. The toy detector
consists of three regions: the Gd-doped liquid scintillator (Gd-LS from now
on), the normal liquid scintillator (LS from now on) and the oil buffer. The
toy detector is cylindrical, like the detector modules of the Daya Bay and
Double Chooz experiments. The diameter of the Gd-LS region is 2.4 m and its
height is 2.6 m. The thickness of the LS region is 0.35 m and the thickness of
the oil region is 0.40 m. The Gd-LS and LS are the same as the scintillators
adopted in the proposal of the CHOOZ experiment [11]. The 8-inch
photomultiplier tubes (PMT from now on) are mounted inside the oil region of
the detector. A total of 366 PMTs are arranged in 8 rings of 30 PMTs on the
lateral surface of the oil region, and in 5 rings of 24, 18, 12, 6 and 3 PMTs
on each of the top and bottom caps.
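
The PMT total can be checked with a few lines of Python, reading the cap rings as present on both the top and the bottom cap. The outer dimensions computed at the end rest on an additional assumption that is not stated in the text, namely that the LS and oil thicknesses apply on every side of the Gd-LS region.

```python
LATERAL_RINGS = 8
PMTS_PER_LATERAL_RING = 30
CAP_RINGS = [24, 18, 12, 6, 3]  # PMTs per ring on one end cap

lateral_pmts = LATERAL_RINGS * PMTS_PER_LATERAL_RING
cap_pmts = 2 * sum(CAP_RINGS)  # top and bottom caps
total_pmts = lateral_pmts + cap_pmts
print(lateral_pmts, cap_pmts, total_pmts)  # 240 126 366

# Assumption: the LS and oil layers surround the Gd-LS on all sides.
GD_LS_DIAMETER_M, GD_LS_HEIGHT_M = 2.4, 2.6
SHELL_M = 0.35 + 0.40
outer_diameter_m = GD_LS_DIAMETER_M + 2 * SHELL_M
outer_height_m = GD_LS_HEIGHT_M + 2 * SHELL_M
```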

3.2 Monte-Carlo Simulation of Toy Detector 

The response of the toy detector to the neutrino and background events is
simulated with GEANT4. Although the physical properties of the scintillator and
the oil (optical attenuation length, refractive index and so on) are wavelength
dependent, only average values [11] are used in the detector simulation (for
example, a uniform optical attenuation length of 8 m for Gd-LS and 20 m for
LS). The program therefore cannot reproduce the real detector response exactly,
but this does not affect the comparison between BNN and the cut-based method.

Following the anti-neutrino interaction in the detectors of reactor neutrino
experiments [12], the neutrino events are generated uniformly throughout the
Gd-LS region (see Fig. 1). The uncorrelated background events are generated as
follows: the fast signal energies are drawn from the energy distribution of the
natural radioactivity given in the proposal of the Daya Bay experiment [9]; the
energies of single-signal neutron events are taken as the delayed signal
energies; the delay times are generated uniformly from 2 $\mu$s to 100 $\mu$s;
and the positions of the fast signal and the delayed signal are generated
uniformly throughout the Gd-LS region. The fast neutron events are generated
uniformly throughout the Gd-LS region with energies uniformly distributed up to
50 MeV, and those that produce two signals are regarded as the fast neutron
background. Since the $^8$He/$^9$Li decay in the detector cannot be simulated
with the GEANT4 package, the $^8$He/$^9$Li events are generated as follows: the
fast signal energies are drawn from the energy distribution of $^8$He/$^9$Li
given in the proposal of the Daya Bay experiment [9], and the other physical
quantities are taken from the fast neutron events.
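
Two of the ingredients above, a vertex drawn uniformly inside the Gd-LS cylinder and a delay time drawn uniformly between 2 and 100 microseconds, can be sketched in plain Python. This is only an illustration of the sampling, not the GEANT4 generator used in the paper; the function names, the choice of origin at the cylinder center and the random seed are assumptions.

```python
import math
import random

GD_LS_RADIUS_M = 1.2   # half of the 2.4 m diameter
GD_LS_HEIGHT_M = 2.6

def sample_vertex(rng):
    """Uniform point inside the Gd-LS cylinder, centered on the origin."""
    r = GD_LS_RADIUS_M * math.sqrt(rng.random())  # sqrt gives uniform area density
    phi = rng.uniform(0.0, 2.0 * math.pi)
    z = rng.uniform(-GD_LS_HEIGHT_M / 2.0, GD_LS_HEIGHT_M / 2.0)
    return (r * math.cos(phi), r * math.sin(phi), z)

def sample_delay_us(rng):
    """Delay time drawn uniformly between 2 and 100 microseconds."""
    return rng.uniform(2.0, 100.0)

rng = random.Random(42)
# Fast-signal vertex, delayed-signal vertex and delay for three toy events.
toy_events = [(sample_vertex(rng), sample_vertex(rng), sample_delay_us(rng))
              for _ in range(3)]
print(toy_events)
```

Taking the square root of the uniform random number for the radius is what makes the points uniform in volume rather than bunched toward the axis.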

4 Event Reconstruction[4] 

The task of event reconstruction in reactor neutrino experiments is to
reconstruct the energy and the vertex of a signal. The maximum likelihood
method (MLD) is a standard algorithm for event reconstruction in reactor
neutrino experiments. The likelihood is defined as the joint Poisson
probability of observing the measured distribution of photoelectrons over all
PMTs for given $(E, \vec{x})$ coordinates in the detector. Ref. [13], from the
CHOOZ experiment, describes the reconstruction method in detail.

In this paper the event reconstruction with MLD is performed in a way similar
to the CHOOZ experiment [13]. Because the detector differs from the CHOOZ
detector, the reconstruction differs from Ref. [13] in the following points:

(1) The detector consists of three regions, so the path length from a signal
vertex to a PMT consists of three parts: the path length in the Gd-LS region,
the one in the LS region and the one in the oil region.

(2) Since not all PMTs receive photoelectrons when an electron deposits energy
in the detector, the $\chi^2$ is modified with respect to the one used in the
CHOOZ experiment:
$\chi^2 = \sum_{N_j = 0} \bar{N}_j + \sum_{N_j \neq 0} \left( \bar{N}_j - N_j + N_j \log\frac{N_j}{\bar{N}_j} \right)$,
where $N_j$ is the number of photoelectrons received by the $j$-th PMT and
$\bar{N}_j$ is the expected number for the $j$-th PMT [13].

(3) $c_E \times N_{total}$ and the coordinates of the charge center of gravity
of all visible photoelectrons from a signal are used as the starting values for
the fit parameters $(E, \vec{x})$, where $N_{total}$ is the total number of
visible photoelectrons from a signal and $c_E$ is the proportionality constant
of the energy $E$, that is, $E = c_E \times N_{total}$. $c_E$ is obtained by
fitting the $N_{total}$ distribution of 1 MeV electron events, and corresponds
to 235 photoelectrons per MeV in the paper.

The fast and delayed signals of an event in the toy detector are each
reconstructed with MLD.
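
The modified $\chi^2$ of point (2) translates directly into code. The sketch below only evaluates the statistic for given observed and expected photoelectron counts; how the expected counts depend on $(E, \vec{x})$ and how the minimization is done are outside its scope, and the function name and example numbers are hypothetical.

```python
import math

def poisson_chi2(observed, expected):
    """Modified chi-square summed over PMTs.

    observed: photoelectron counts N_j; expected: predicted counts (> 0).
    PMTs with no hit contribute the expected count alone.
    """
    total = 0.0
    for n, nbar in zip(observed, expected):
        if n == 0:
            total += nbar
        else:
            total += nbar - n + n * math.log(n / nbar)
    return total

# Hypothetical counts for four PMTs, one of which saw no photoelectrons.
print(poisson_chi2([3, 0, 5, 1], [2.5, 0.4, 5.0, 1.2]))
```

Treating the zero-hit PMTs separately avoids evaluating $N_j \log N_j$ at $N_j = 0$ while still penalizing a hypothesis that predicts light where none was seen.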



5 Monte-Carlo Sample in Signal Region 

The selection criteria for neutrino events are as follows:

(1) Positron energy: 1.3 MeV < $E_{e^+}$ < 8 MeV;

(2) Neutron energy: 6 MeV < $E_n$ < 10 MeV;

(3) Neutron delay: 2 $\mu$s < $\Delta t_{e^+n}$ < 100 $\mu$s;

(4) Relative positron-neutron distance: $d_{e^+n}$ < 100 cm.

The selection defines a hyper-cuboid in the event space; its inside is the
signal region and its outside is the background region. 39000 neutrino events
are generated in the signal region, together with 11000 events each of the
uncorrelated background, the fast neutron background and the $^8$He/$^9$Li
background.
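
The four cuts amount to a simple box test on each reconstructed event. A minimal Python sketch, with a hypothetical function name and argument order, is:

```python
def in_signal_region(e_positron_mev, e_neutron_mev, delay_us, distance_cm):
    """Return True if an event passes all four selection cuts."""
    return (
        1.3 < e_positron_mev < 8.0
        and 6.0 < e_neutron_mev < 10.0
        and 2.0 < delay_us < 100.0
        and distance_cm < 100.0
    )

# Hypothetical events: (positron energy, neutron energy, delay, distance).
print(in_signal_region(3.0, 8.0, 30.0, 20.0))   # True
print(in_signal_region(0.9, 8.0, 30.0, 20.0))   # False, positron energy too low
```

Any background event for which this test returns True is indistinguishable from a neutrino event by cuts alone, which is the population the BNN is trained to separate.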

6 Neutrino Discrimination with BNN in Signal Region 

The energies of the fast signal and the delayed signal ($E_{e^+}$, $E_n$), the
delay time of the delayed signal ($\Delta t_{e^+n}$) and the distance between
the fast signal and the delayed signal ($d_{e^+n}$) are used as inputs to all
neural networks, which have the same structure: an input layer of four inputs,
a single hidden layer of nine nodes and an output layer with a single output,
which is the probability that an event is a neutrino event. A Markov chain of
neural networks is generated using the Bayesian neural networks package of
Radford Neal, with a training sample consisting of neutrino events and
backgrounds. One thousand iterations, of twenty MCMC steps each, are used.
Because the correlation between adjacent steps is very high, the neural network
parameters are stored only after each iteration, that is, after every twenty
steps, which lessens the correlation between the saved points in parameter
space. It is also necessary to discard the initial part of the Markov chain,
because the points in that part are strongly correlated with the initial point
of the chain. The initial three hundred iterations are discarded here. 3000
events each of the neutrino and the three backgrounds are used to test the
identification capability of the trained BNN. The BNNs are trained with
different training samples, which consist of neutrino events and the three
backgrounds at different rates, and different identification efficiencies are
obtained with those BNNs. The results of the identification are listed in
Tab. 1.
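
The bookkeeping of the chain (one stored network per iteration of twenty steps, with the first three hundred iterations discarded) can be sketched as follows. The chain here is a list of integers standing in for parameter points; the function name and default arguments are hypothetical.

```python
def stored_networks(chain, steps_per_iteration=20, burn_in_iterations=300):
    """Keep one point per iteration, then drop the burn-in iterations."""
    per_iteration = chain[steps_per_iteration - 1::steps_per_iteration]
    return per_iteration[burn_in_iterations:]

# Stand-in chain: integers label the 1000 x 20 individual MCMC steps.
chain = list(range(1000 * 20))
kept = stored_networks(chain)
print(len(kept))  # 700
```

With these settings 700 networks remain for the average over sampled networks.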

7 Results and Discussion 

As shown in Tab. 1, the neutrino discrimination with BNN in the signal region
increases from 82.6% to 91.2% as the neutrino rate in the training sample
increases from one half to nine fourteenths. The background discriminations,
however, decrease as the background rates in the training sample decrease. The
uncorrelated background discrimination decreases from 88.2% to 73.6% as its
rate decreases from one sixth to one fourteenth. The fast neutron background
discrimination decreases from 48.5% to 37.6% as its rate decreases from one
sixth to one seventh. The $^8$He/$^9$Li background discrimination decreases
from 51.8% to 39.9% as its rate decreases from one sixth to one seventh. As a
result, most neutrino events and most uncorrelated background events in the
signal region can be identified with BNN, and part of the fast neutron and
$^8$He/$^9$Li background events in the signal region can be identified as well.
Different signal-to-noise ratios in the signal region are obtained with BNNs
trained on samples that contain neutrino and background events at different
rates. In short, the signal-to-noise ratio in the signal region can be enhanced
with BNN in reactor neutrino experiments.
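
As a quick consistency check of the quoted training-sample rates, the fractions at the two ends of the quoted ranges each sum to one. The sketch assumes that the starting rates all belong to one training sample and the final rates to another, which the text implies but does not state explicitly.

```python
from fractions import Fraction

# Training-sample fractions: neutrino, uncorrelated, fast neutron, 8He/9Li.
first_sample = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
last_sample = [Fraction(9, 14), Fraction(1, 14), Fraction(1, 7), Fraction(1, 7)]

for sample in (first_sample, last_sample):
    print(sum(sample))  # each prints 1
```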
Tab. 1: Identification efficiencies obtained with the BNNs trained with
different training samples, which consist of the neutrino events and the three
backgrounds at different rates. The term after ± is the statistical error of
the identification efficiency. 3000 events each of the uncorrelated background,
fast neutron and $^8$He/$^9$Li are regarded as the test sample.
Fig. 1: Neutrino events from the Monte-Carlo simulation of the toy detector,
generated uniformly throughout the Gd-LS region. (a) Distribution of the
positron energy; (b) distribution of the energy of the neutron captured by Gd;
(c) distribution of the distance between the positron and neutron positions;
(d) distribution of the delay time of the neutron signal.



